Toward upgrades-as-a-service in distributed systems

Authors

  • Tudor Dumitras
  • Priya Narasimhan
Abstract

Unavailability in distributed enterprise systems is usually the result of planned events, such as upgrades, rather than failures. Major system upgrades entail complex data conversions that are difficult to perform on the fly, in the face of live workloads. Minimizing the downtime imposed by such conversions is a time-intensive and error-prone manual process. We propose upgrades-as-a-service, a novel approach that can eliminate all the causes of planned downtime recorded during the upgrade history of one of the ten most popular websites. Building on the lessons learned from past research on live upgrades in middleware systems, upgrades-as-a-service trades the need for additional hardware resources during the upgrade for the ability to perform end-to-end upgrades online, with minimal application-specific knowledge.

Similar resources

Modular Software Upgrades for Distributed Systems

Upgrading the software of long-lived, highly-available distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable and halting the system for an upgrade is unacceptable. Instead, upgrades must happen gradually, and there may be long periods of time when different nodes run different software versions and need to communic...

Eternal: Fault Tolerance and Live Upgrades for Distributed Object Systems

The Eternal system supports distributed object applications that must operate continuously, without interruption of service, despite faults and despite upgrades to the hardware and the software. Based on the CORBA distributed object computing standard, the Eternal system replicates objects, invisibly and consistently, so that if one replica of an object fails, or is being upgraded, another repl...

Improving the Dependability of Distributed Systems through AIR Software Upgrades

Traditional fault-tolerance mechanisms concentrate almost entirely on responding to, avoiding, or tolerating unexpected faults or security violations. However, scheduled events, such as software upgrades, account for most of the system unavailability and often introduce data corruption or latent errors. Through two empirical studies, this dissertation identifies the leading causes of upgrade fa...

Automatic software upgrades for distributed systems

Upgrading the software of long-lived distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be down and halting the system for an upgrade is unacceptable. This means that different nodes may be running different software versions and yet need to communicate, even though those versions may not be fully compatible. We present a meth...

Why Do Upgrades Fail and What Can We Do about It? Toward Dependable, Online Upgrades in Enterprise Systems

Enterprise-system upgrades are unreliable and often produce downtime or data loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading cause of upgrade failures. We propose a novel upgrade-centric fault model, based on data from three independent sources, which focuses on the impact of procedural errors rather than software defects. We show that current approache...

Publication date: 2009